In the present paper we propose a new estimator of entropy based on smooth estimators of quantile density. The consistency and asymptotic distribution of the proposed estimates are obtained. As a consequence, a new test of normality is proposed. A small power comparison is provided. A simulation study for the comparison, in terms of mean squared error, of all estimators under study is performed.
1 Introduction and estimation

Let $X$ be a random variable [r.v.] with cumulative distribution function [cdf] $F(x) := \mathbb{P}(X \le x)$ for $x \in \mathbb{R}$ and a density function $f(\cdot)$ with respect to Lebesgue measure on $\mathbb{R}$. Then its differential (or Shannon) entropy is defined by

$$H(X) := -\int_{\mathbb{R}} f(x)\log f(x)\,dx, \qquad (1.1)$$

where $dx$ denotes Lebesgue measure on $\mathbb{R}$. We assume that $H(X)$ is properly defined by the integral (1.1), in the sense that

$$|H(X)| < \infty. \qquad (1.2)$$

The concept of differential entropy was introduced in Shannon's original paper [Shannon (1948)]. Since this early epoch, the notion of entropy has been the subject of great theoretical and applied interest. Entropy concepts and principles play a fundamental role in many applications, such as statistical communication theory [Gallager (1968)], quantization theory [Renyi (1959)], statistical decision theory [Kullback (1959)], and contingency table analysis [Gokhale and Kullback (1978)]. Csiszar (1962) introduced the concept of convergence in entropy and showed that this mode of convergence implies convergence in $L_1$. This property indicates that entropy is a useful concept to measure "closeness in distribution", and also justifies heuristically the use of sample entropy as a test statistic when designing entropy-based goodness-of-fit tests. This line of research has been pursued by Vasicek (1976), Prescott (1976), Dudewicz and van der Meulen (1981), Gokhale (1983), Ebrahimi et al. (1992) and Esteban et al. (2001) [including the references therein]. The idea here is that many families of distributions are characterized by maximization of entropy subject to constraints (see, e.g., Jaynes (1957) and Lazo and Rathie (1978)). There is a huge literature on Shannon's entropy and its applications, and it is not the purpose of this paper to survey it. We refer to Cover and Thomas (2006) (see their Chapter 8) for a comprehensive overview of differential entropy and its mathematical properties.

In the literature, several proposals have been made to estimate entropy. Dmitriev and Tarasenko (1973) and Ahmad and Lin (1976) proposed estimators of the entropy using kernel-type estimators of the density $f(\cdot)$. Vasicek (1976) proposed an entropy estimator based on spacings. Inspired by the work of Vasicek (1976), some authors [van Es (1992), Correa (1995) and Wieczorkowski and Grzegorzewski (1999)] proposed modified entropy estimators, improving in some respects the properties of Vasicek's estimator (see Section 4 below). The reader will find in Beirlant et al. (1997) detailed accounts of the theory as well as surveys of entropy estimators.

This paper aims to introduce a new entropy estimator and to obtain its asymptotic properties. Simulations indicate that, in most of the cases considered, our estimator produces a smaller mean squared error than the other well-known competitors considered in this work. As a consequence, we propose a new test of normality. First, we introduce some definitions and notations. For each distribution function $F(\cdot)$, we define the quantile function by

$$Q(t) := \inf\{x : F(x) \ge t\}, \qquad 0 < t < 1.$$

Let

$$x_F := \sup\{x : F(x) = 0\} \quad \text{and} \quad x^F := \inf\{x : F(x) = 1\}, \qquad -\infty \le x_F < x^F \le \infty.$$
We assume that the distribution function $F(\cdot)$ has a density $f(\cdot)$ (with respect to Lebesgue measure on $\mathbb{R}$), and that $f(x) > 0$ for all $x \in (x_F, x^F)$. Let

$$q(x) := \frac{dQ(x)}{dx} = \frac{1}{f(Q(x))}, \qquad 0 < x < 1,$$

be the quantile density function [qdf]. The entropy $H(X)$, defined by (1.1), can be expressed in the form of a quantile-density functional as

$$H(X) = \int_{[0,1]} \log\left(\frac{d}{dx}Q(x)\right)dx = \int_{[0,1]} \log(q(x))\,dx. \qquad (1.3)$$

Indeed, the change of variables $x = Q(u)$ in (1.1), with $dx = q(u)\,du$ and $f(Q(u)) = 1/q(u)$, gives (1.3).

Vasicek's estimator was constructed by replacing the quantile function $Q(\cdot)$ in (1.3) by the empirical quantile function and using a difference operator instead of the differential one. The derivative $(d/dx)Q(x)$ is then estimated by a function of spacings.
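As a quick numerical check of (1.3), the following Python sketch approximates $\int_0^1 \log q(u)\,du$ for the standard normal distribution, where $q(u) = 1/f(Q(u))$, and compares it with the closed form $\log\sqrt{2\pi e}$. It is an illustration only (the simulations of Section 4 were programmed in R and are not reproduced here); the midpoint rule, the grid size and the helper names `entropy_from_qdf` and `normal_qdf` are arbitrary choices.

```python
import math
from statistics import NormalDist


def entropy_from_qdf(qdf, n_points=100_000):
    """Midpoint-rule approximation of H(X) = int_0^1 log q(u) du, cf. (1.3)."""
    step = 1.0 / n_points
    total = 0.0
    for i in range(n_points):
        u = (i + 0.5) * step  # midpoints never hit the endpoints 0 and 1
        total += math.log(qdf(u))
    return total * step


std_normal = NormalDist(0.0, 1.0)


def normal_qdf(u):
    """q(u) = 1 / f(Q(u)) for the standard normal distribution."""
    return 1.0 / std_normal.pdf(std_normal.inv_cdf(u))


h_quantile = entropy_from_qdf(normal_qdf)
h_closed_form = 0.5 * math.log(2.0 * math.pi * math.e)  # log(sqrt(2*pi*e))
print(h_quantile, h_closed_form)
```

The integrand $\log q(u)$ is unbounded near $u = 0$ and $u = 1$ but integrable, which is why a rule that avoids the endpoints is used.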

In this work, we construct our estimator of entropy by replacing $q(\cdot)$ in (1.3) by an appropriate estimator $q_n(\cdot)$ of $q(\cdot)$. We shall consider the kernel-type estimator of $q(\cdot)$ introduced by Falk (1986) and studied by Cheng and Parzen (1997). Our choice is motivated by the good asymptotic properties of this estimator. Cheng and Parzen (1997) established the asymptotic properties of $q_n(\cdot)$ on every compact $U \subset ]0,1[$, which avoids the boundary problems. Since the entropy is defined as an integral over $]0,1[$ of a functional of $q(\cdot)$, it is not suitable to substitute $q_n(\cdot)$ directly in (1.3) to estimate $H(X)$. To circumvent the boundary effects, we proceed as follows. We set, for small $\varepsilon \in ]0, 1/2[$,



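$$H_\varepsilon(X) := \varepsilon\log(q(\varepsilon)) + \varepsilon\log(q(1-\varepsilon)) + \int_{\varepsilon}^{1-\varepsilon}\log(q(x))\,dx.$$

In view of (1.3), we can see that

$$|H(X) - H_\varepsilon(X)| = o(\eta(\varepsilon)), \qquad (1.4)$$

where $\eta(\varepsilon) \to 0$ as $\varepsilon \to 0$. Choosing $\varepsilon$ close to zero guarantees the closeness of $H_\varepsilon(X)$ to $H(X)$, so the problem of estimating $H(X)$ reduces to estimating $H_\varepsilon(X)$.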
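To illustrate (1.4), the sketch below evaluates $H_\varepsilon(X)$ for the standard normal distribution at a few values of $\varepsilon$ and prints the gap $|H(X) - H_\varepsilon(X)|$, which should shrink as $\varepsilon$ decreases. The integral over $[\varepsilon, 1-\varepsilon]$ is approximated with a midpoint rule; the function name `truncated_entropy` is hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def normal_qdf(u):
    """q(u) = 1 / f(Q(u)) for N(0, 1)."""
    return 1.0 / std_normal.pdf(std_normal.inv_cdf(u))


def truncated_entropy(qdf, eps, n_points=20_000):
    """H_eps = eps*log q(eps) + eps*log q(1-eps) + int_eps^{1-eps} log q(x) dx."""
    width = (1.0 - 2.0 * eps) / n_points
    integral = sum(
        math.log(qdf(eps + (i + 0.5) * width)) for i in range(n_points)
    ) * width
    return eps * math.log(qdf(eps)) + eps * math.log(qdf(1.0 - eps)) + integral


h_true = 0.5 * math.log(2.0 * math.pi * math.e)
for eps in (0.1, 0.05, 0.01, 0.001):
    gap = abs(h_true - truncated_entropy(normal_qdf, eps))
    print(f"eps={eps:<6} |H - H_eps| = {gap:.5f}")
```

For the normal distribution the two boundary terms $\varepsilon\log q(\varepsilon)$ and $\varepsilon\log q(1-\varepsilon)$ only partly compensate for the discarded tails, so the gap is of the order of a few hundredths at $\varepsilon = 0.01$.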
Given an independent and identically distributed [i.i.d.] random sample $X_1, \ldots, X_n$ and $\varepsilon \in ]0, 1/2[$, an estimator of $H_\varepsilon(X)$ can be defined as

$$H_{\varepsilon;n}(X) = \varepsilon\log(q_n(\varepsilon)) + \varepsilon\log(q_n(1-\varepsilon)) + \int_{\varepsilon}^{1-\varepsilon}\log(q_n(x))\,dx, \qquad (1.5)$$

where the estimator $q_n(\cdot)$ of the qdf $q(\cdot)$ is defined as follows. Let $X_{1;n} \le \cdots \le X_{n;n}$ denote the order statistics of $X_1, \ldots, X_n$. The empirical quantile function $Q_n(\cdot)$ based upon these random variables is given by

$$Q_n(t) := X_{k;n}, \qquad (k-1)/n < t \le k/n, \quad k = 1, \ldots, n. \qquad (1.6)$$
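A direct transcription of (1.6) may help fix the indexing conventions. This is a minimal sketch; `empirical_quantile` is a hypothetical helper, and at the exact breakpoints $t = k/n$ floating-point rounding of $nt$ may select the neighbouring order statistic.

```python
import math


def empirical_quantile(sample, t):
    """Q_n(t) = X_{k;n} for (k - 1)/n < t <= k/n, cf. (1.6)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in ]0, 1]")
    order_stats = sorted(sample)
    n = len(order_stats)
    k = min(max(math.ceil(n * t), 1), n)  # smallest k with t <= k/n
    return order_stats[k - 1]  # the text indexes order statistics from 1


data = [2.3, -0.7, 1.1, 0.4]  # hypothetical sample with n = 4
print([empirical_quantile(data, t) for t in (0.1, 0.25, 0.26, 0.5, 1.0)])
# [-0.7, -0.7, 0.4, 0.4, 2.3]
```

Because $Q_n(\cdot)$ is a step function, its derivative vanishes between jumps, which is why a smoothed version is needed before differentiating.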
Let $\{K_n(t,x), (t,x) \in [0,1] \times ]0,1[\}_{n \ge 1}$ be a sequence of kernels and $\{\mu_n(\cdot)\}_{n \ge 1}$ a sequence of $\sigma$-finite measures on $[0,1]$. A smoothed version of $Q_n(\cdot)$ (see, e.g., Cheng and Parzen (1997)) can be defined as

$$\widetilde{Q}_n(t) := \int_0^1 Q_n(x)K_n(t,x)\,d\mu_n(x), \qquad t \in ]0,1[.$$

Finally, we estimate $q(\cdot)$ by

$$q_n(t) := \frac{d}{dt}\widetilde{Q}_n(t) = \frac{d}{dt}\int_0^1 Q_n(x)K_n(t,x)\,d\mu_n(x), \qquad t \in ]0,1[. \qquad (1.7)$$

Clearly, in order to obtain a meaningful qdf estimator in this way, the sequence of kernels $K_n(\cdot,\cdot)$ must satisfy certain differentiability conditions and, together with the sequence of measures $\mu_n(\cdot)$, certain variational conditions. These conditions will be detailed in the next section. A familiar example is the convolution-kernel estimator

$$\widetilde{q}_n(t) := \frac{d}{dt}\int_0^1 Q_n(x)\,\frac{1}{h_n}K\left(\frac{t-x}{h_n}\right)dx, \qquad t \in ]0,1[,$$

where $K(\cdot)$ denotes a kernel function, namely a measurable function integrating to 1 on $\mathbb{R}$ and having a bounded derivative, and $\{h_n\}_{n \ge 1}$ is a sequence of positive reals fulfilling $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$. In this case, Csorgo and Revesz (1984) and Csorgo et al. (1991) define

$$q_n(t) := \int_{1/(n+1)}^{n/(n+1)} \frac{1}{h_n}K\left(\frac{t-x}{h_n}\right)dQ_n(x) = h_n^{-1}\sum_{i=1}^{n-1} K\left(\frac{t - i/n}{h_n}\right)\left(X_{i+1;n} - X_{i;n}\right), \qquad t \in [0,1].$$

Calculations using summation by parts show that, for all $t \in ]0,1[$,

In the sequel, we shall consider the general family of qdf estimators defined in (1.7). 
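For the convolution-kernel case, the sum above and the estimator (1.5) can be sketched as follows. This is only an illustration under stated assumptions: a standard Gaussian kernel, a hand-picked bandwidth $h_n = 0.02$ and $\varepsilon = 0.01$ (neither value is taken from the paper), and a midpoint rule for the integral in (1.5). The helper names `qdf_estimate` and `entropy_estimate` are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def qdf_estimate(order_stats, t, h):
    """q_n(t) = h^{-1} * sum_{i=1}^{n-1} K((t - i/n) / h) * (X_{i+1;n} - X_{i;n})."""
    n = len(order_stats)
    total = 0.0
    for i in range(1, n):  # i = 1, ..., n - 1 as in the text
        spacing = order_stats[i] - order_stats[i - 1]  # X_{i+1;n} - X_{i;n}
        total += gaussian_kernel((t - i / n) / h) * spacing
    return total / h


def entropy_estimate(sample, eps, h, n_grid=400):
    """H_{eps;n}(X) of (1.5); the integral over [eps, 1 - eps] uses the midpoint rule."""
    xs = sorted(sample)
    width = (1.0 - 2.0 * eps) / n_grid
    integral = sum(
        math.log(qdf_estimate(xs, eps + (j + 0.5) * width, h)) for j in range(n_grid)
    ) * width
    return (eps * math.log(qdf_estimate(xs, eps, h))
            + eps * math.log(qdf_estimate(xs, 1.0 - eps, h))
            + integral)


rng = random.Random(2024)  # arbitrary seed
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
estimate = entropy_estimate(sample, eps=0.01, h=0.02)
print(estimate, 0.5 * math.log(2.0 * math.pi * math.e))  # estimate vs. true H(X)
```

The sum form is used instead of differentiating the smoothed quantile function numerically; with a positive kernel and distinct observations, $q_n(t) > 0$, so the logarithms in (1.5) are well defined.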



The remainder of the present article is organized as follows. The consistency and normality of our estimator are discussed in the next section. Our arguments, used to establish the asymptotic normality of our estimator, make use of an original application of the invariance principle for the empirical quantile process. In Section 3, we briefly discuss the smoothed version of Parzen's entropy estimator. In Section 4, we investigate the finite-sample performance of the newly proposed estimator and compare it with the performance of existing estimators. In Section 5, a new test of normality is proposed and compared with other competitors. Some concluding remarks and possible future developments are mentioned in Section 6. To avoid interrupting the flow of the presentation, all mathematical developments are relegated to Section 7.
2 Main results



Throughout, $U(\varepsilon) := [\varepsilon, 1-\varepsilon]$, where $\varepsilon \in ]0, 1/2[$ is arbitrarily fixed (free of the sample size $n$). The following conditions are used to establish the main results. We recall the notations of Section 1.

(Q.1) The quantile density function $q(\cdot)$ is twice differentiable in $]0,1[$. Moreover,

$$0 < \min\{q(0), q(1)\} \le \infty;$$

(Q.2) There exists a constant $c > 0$ such that

$$\sup_{t \in ]0,1[}\left\{t(1-t)|J(t)|\right\} \le c,$$

where $J(t) := d\log\{q(t)\}/dt$ is the score function;

(Q.3) Either $q(0) < \infty$ or $q(\cdot)$ is nonincreasing in some interval $(0, t_*)$, and either $q(1) < \infty$ or $q(\cdot)$ is nondecreasing in some interval $(t^*, 1)$, where $0 < t_* < t^* < 1$.

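We will make the following assumptions on the sequence of kernels $K_n(\cdot,\cdot)$.

(K.1) For each $n \ge 1$ and each $(t,x) \in U(\varepsilon) \times ]0,1[$, $K_n(t,x) \ge 0$, and for each $t \in U(\varepsilon)$,

$$\int_0^1 K_n(t,x)\,d\mu_n(x) = 1;$$

(K.2) There is a sequence $\delta_n \downarrow 0$ such that

$$\sup_{t \in U(\varepsilon)}\left|1 - \int_{t-\delta_n}^{t+\delta_n} K_n(t,x)\,d\mu_n(x)\right| \to 0, \quad \text{as } n \to \infty;$$

(K.3) For any function $g(\cdot)$ that is at least three times differentiable in $]0,1[$,

$$g_n(t) := \int_0^1 g(x)K_n(t,x)\,d\mu_n(x)$$

is differentiable in $t$ on $U(\varepsilon)$, and

$$\sup_{t \in U(\varepsilon)}\left|g(t) - \int_0^1 g(x)K_n(t,x)\,d\mu_n(x)\right| = O(n^{-\alpha}), \quad \alpha > 0;$$

$$\sup_{t \in U(\varepsilon)}\left|g'(t) - \frac{d}{dt}\int_0^1 g(x)K_n(t,x)\,d\mu_n(x)\right| = O(n^{-\beta}), \quad \beta > 0.$$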



Note that the conditions (Q.1-2-3) and (K.1-2-3), which we will use to establish the convergence in probability of $H_{\varepsilon;n}(X)$, match those used by Cheng and Parzen (1997) to establish the asymptotic properties of $q_n(\cdot)$.

Our result concerning the consistency of $H_{\varepsilon;n}(X)$ defined in (1.5), where the qdf estimator is given in (1.7), is as follows.
Theorem 2.1 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with a quantile density function $q(\cdot)$ fulfilling (Q.1-2-3). Let $K_n(\cdot,\cdot)$ fulfill (K.1-2-3). Then we have

$$|H_{\varepsilon;n}(X) - H(X)| = O_{\mathbb{P}}\left(n^{-1/2}M(q;K_n) + \cdots + \eta(\varepsilon)\right), \qquad (2.1)$$

where

$$M(q;K_n) := M_q M_n(1)\sqrt{\delta_n\log\delta_n^{-1}} + M_{q'} + \cdots + M_n(q^*)R_n'(1) + \cdots,$$

$$M_n(g) := \sup_{u \in U(\varepsilon)}\int_0^1 |g(x)K_n(u,x)|\,d\mu_n(x),$$

$$R_n(g) := \sup_{u \in U(\varepsilon)}\int_{[0,1]\setminus U(\varepsilon+\delta_n)} |g(x)K_n(u,x)|\,d\mu_n(x),$$

$$M_g := \sup_{u \in U(\varepsilon)} \cdots,$$

and $\delta_n$ is given in condition (K.2).

The proof of Theorem 2.1 is postponed to Section 7.

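To establish the asymptotic normality of $H_{\varepsilon;n}(X)$, we will use the following additional condition.

(Q.4) $\mathbb{E}\{\log^2(q(F(X)))\} < \infty$.

We also assume that, for all $\varepsilon \in ]0, 1/2[$,

$$\left|\mathbb{E}\{\log^2(q(F(X)))\} - \left\{\varepsilon\log^2(q(\varepsilon)) + \varepsilon\log^2(q(1-\varepsilon)) + \int_{U(\varepsilon)}\log^2(q(x))\,dx\right\}\right| = o(\vartheta(\varepsilon)),$$

where $\vartheta(\varepsilon) \to 0$ as $\varepsilon \to 0$.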

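The main result concerning the normality of $H_{\varepsilon;n}(X)$, to be proved here, may now be stated precisely as follows.

Theorem 2.2 Assume that the conditions (Q.1-2-3-4) and (K.1-2-3) hold with $\alpha > 1/2$ and $\beta > 1/2$ in (K.3). Then we have

$$\sqrt{n}\left(H_{\varepsilon;n}(X) - H_\varepsilon(X)\right) - \psi_n(\varepsilon) = O_{\mathbb{P}}\left(\sqrt{2\varepsilon\log(1/\varepsilon)}\right) + o_{\mathbb{P}}(1), \qquad (2.2)$$

where

$$\psi_n(\varepsilon) := \int_{U(\varepsilon)} \frac{q'(x)}{q(x)}B_n(x)\,dx$$

is a centered Gaussian random variable with variance equal to

$$\mathrm{Var}(\psi_n(\varepsilon)) = \mathrm{Var}\{\log(q(F(X)))\} + o\left(\vartheta(\varepsilon) + \eta(\varepsilon)\right).$$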
The proof of Theorem 2.2 is postponed to Section 7.
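Since $q(F(x)) = 1/f(x)$, the limiting variance in Theorem 2.2 is $\mathrm{Var}\{\log q(F(X))\} = \mathrm{Var}\{\log f(X)\}$. For $X \sim N(0,1)$ one has $-\log f(X) = \tfrac12\log(2\pi) + X^2/2$, so this variance equals $\mathrm{Var}(X^2)/4 = 1/2$. The Monte Carlo sketch below, with an arbitrary seed and replication count, is a worked illustration of this calculation rather than part of the paper's argument.

```python
import math
import random
from statistics import NormalDist

std_normal = NormalDist()
rng = random.Random(7)  # arbitrary seed

# -log f(X) = log q(F(X)); for X ~ N(0, 1) this is 0.5*log(2*pi) + X**2 / 2
values = [-math.log(std_normal.pdf(rng.gauss(0.0, 1.0))) for _ in range(100_000)]
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance)  # mean close to H(X) = 1.4189..., variance close to 1/2
```

The sample mean of the same values estimates $\mathbb{E}\{-\log f(X)\} = H(X)$, which is another way to read (1.1).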

Remark 2.3 Condition (Q.4) is extremely weak and is satisfied by all commonly encountered distributions, including many important heavy-tailed distributions for which the moments do not exist; the interested reader may refer to Song (2000) and the references therein.

Remark 2.4 We mention that condition (K.3) is satisfied, when $\alpha, \beta > 1/2$, for the difference kernels $K_n(t,x)\,d\mu_n(x) = h_n^{-1}k((t-x)/h_n)\,dx$ with $h_n = O(n^{-\nu})$, where $1/4 < \nu < 1$. A typical example of a difference kernel satisfying these conditions is the Gaussian kernel.



3 The smoothed Parzen estimator of entropy 

We mention that the notations of this section are similar to those used in Parzen (1979), with changes made in order to adapt them to our setting. Let $X_1, \ldots, X_n$ be a random sample of a continuous random variable $X$ with distribution function $F(\cdot)$. In this section we will work under the following hypothesis: for all $x \in \mathbb{R}$,

$$\mathcal{H}^*:\quad F(x) = F_0\left(\frac{x-\mu}{\sigma}\right),$$

where $\mu$ and $\sigma$ are parameters to be estimated. It is convenient to transform $X_j$ to

$$U_j := F_0\left(\frac{X_j - \hat\mu_n}{\hat\sigma_n}\right),$$

where $\hat\mu_n$ and $\hat\sigma_n$ are efficient estimators of $\mu$ and $\sigma$. It is more convenient to base tests of the hypothesis on the deviations from the identity function $t$ of the empirical distribution function of $U_1, \ldots, U_n$. Let $D_{n,0}(t)$, $0 < t < 1$, denote the empirical quantile function of $U_1, \ldots, U_n$; it can be expressed in terms of the sample quantile function $\widetilde{Q}_n(\cdot)$ of the original data $X_1, \ldots, X_n$ by

$$D_{n,0}(t) = F_0\left(\frac{\widetilde{Q}_n(t) - \hat\mu_n}{\hat\sigma_n}\right),$$

where

$$\widetilde{Q}_n(t) := n\left(\frac{i}{n} - t\right)X_{i-1;n} + n\left(t - \frac{i-1}{n}\right)X_{i;n}, \qquad \frac{i-1}{n} \le t \le \frac{i}{n}, \quad i = 1, \ldots, n.$$

In our case, we may consider the smoothed version of the quantile density given in Parzen (1979), defined by

$$d_{n,0}(t) := \frac{d}{dt}D_{n,0}(t) = f_0\left(\frac{\widetilde{Q}_n(t) - \hat\mu_n}{\hat\sigma_n}\right)\frac{1}{\hat\sigma_n}\frac{d}{dt}\widetilde{Q}_n(t).$$
Consequently, define

$$d_n(t) := f_0(Q_0(t))\,\frac{d}{dt}\widetilde{Q}_n(t)\,\frac{1}{\sigma_{0,n}},$$

where

$$\sigma_{0,n} := \int_0^1 f_0(Q_0(t))\,\frac{d}{dt}\widetilde{Q}_n(t)\,dt.$$

This is an alternative approach to estimating $q(\cdot)$ when one desires a goodness-of-fit test of a location-scale parametric model; we refer to Parzen (1991) for more details. We mention that in (Parzen, 1979, Section 7), the smoothed version of $d_n(t)$ is defined, for the convolution kernel, by

$$\widetilde{d}_n(t) := \int_0^1 d_n(u)\,\frac{1}{h_n}K\left(\frac{t-u}{h_n}\right)du.$$

This suggests the following estimator of entropy:

$$H_n(X) := \int_0^1 \log(\widetilde{d}_n(t))\,dt. \qquad (3.1)$$

Under conditions similar to those of Theorem 2.1, we could derive, under $\mathcal{H}^*$, that, as $n \to \infty$,

$$|H_n(X) - H(X)| = o_{\mathbb{P}}(1). \qquad (3.2)$$

For more details on goodness-of-fit tests in connection with Shannon's entropy, the interested reader may refer to Parzen (1991). Note that normality is an example of a location-scale parametric model.

Remark 3.1 Recall that there exist several ways to obtain smoothed versions of $d_n(\cdot)$. Indeed, we can choose the following smoothing method. Keep in mind the definition

$$q_n(t) = \frac{d}{dt}\int_0^1 Q_n(x)K_n(t,x)\,d\mu_n(x), \qquad t \in ]0,1[.$$

Then we can use the following estimator:

$$d_n^*(t) := f_0(Q_0(t))\,q_n(t)\,\frac{1}{\sigma_{0,n}}.$$

Finally, we estimate the entropy under $\mathcal{H}^*$ by

$$H_n^*(X) := \int_0^1 \log(d_n^*(t))\,dt. \qquad (3.3)$$

Under the conditions of Theorem 2.1, we could derive, under $\mathcal{H}^*$, that, as $n \to \infty$,

$$|H_n^*(X) - H(X)| = o_{\mathbb{P}}(1). \qquad (3.4)$$
It would be interesting to provide a complete investigation of $H_n(X)$ and $H_n^*(X)$ and to study tests of normality based on them. This would go well beyond the scope of the present paper.

Remark 3.2 The nonparametric approach that we use to estimate Shannon's entropy treats the density as parameter-free and thus offers the greatest generality. This is in contrast with the parametric approach, where one assumes that the underlying distribution follows a certain parametric model, and the problem reduces to estimating a finite number of parameters describing the model.



4 A comparison study by simulations 

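Assuming that $X_1, \ldots, X_n$ is the sample, the estimator proposed by Vasicek (1976) is given by

$$H^{(V)}_{m,n} := \frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{2m}\left(X_{i+m;n} - X_{i-m;n}\right)\right),$$

where $m$ is a positive integer fulfilling $m < n/2$, with $X_{i;n} = X_{1;n}$ for $i < 1$ and $X_{i;n} = X_{n;n}$ for $i > n$. Vasicek proved that his estimator is consistent, i.e.,

$$H^{(V)}_{m,n} \xrightarrow{\mathbb{P}} H(X), \quad \text{as } n \to \infty,\ m \to \infty \text{ and } m/n \to 0.$$

Vasicek's estimator is also asymptotically normal under appropriate conditions on $m$ and $n$.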
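A minimal Python sketch of $H^{(V)}_{m,n}$, using the convention $X_{i;n} = X_{1;n}$ for $i < 1$ and $X_{i;n} = X_{n;n}$ for $i > n$; the function name is hypothetical and tied observations (zero spacings) are not handled.

```python
import math


def vasicek_entropy(sample, m):
    """H^{(V)}_{m,n} = (1/n) sum_i log(n/(2m) * (X_{i+m;n} - X_{i-m;n}))."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m < n / 2:
        raise ValueError("m must be a positive integer with m < n/2")

    def order_stat(i):  # 1-indexed X_{i;n}, clamped to X_{1;n} / X_{n;n}
        return xs[min(max(i, 1), n) - 1]

    return sum(math.log(n / (2 * m) * (order_stat(i + m) - order_stat(i - m)))
               for i in range(1, n + 1)) / n


print(vasicek_entropy([i / 10 for i in range(1, 11)], m=2))
```

The clamping shortens the windows near both ends, which is one source of the negative bias of this estimator visible in the tables below.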

van Es (1992) proposed a new estimator of entropy given by

$$H^{(Van)}_{m,n} := \frac{1}{n-m}\sum_{i=1}^{n-m}\log\left(\frac{n+1}{m}\left(X_{i+m;n} - X_{i;n}\right)\right) + \sum_{k=m}^{n}\frac{1}{k} + \log(m) - \log(n+1).$$

van Es (1992) established, under suitable conditions, the consistency and asymptotic normality of this estimator.
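The van Es estimator can be sketched in the same way (function name hypothetical). It uses the one-sided spacings $X_{i+m;n} - X_{i;n}$, so no boundary convention is needed.

```python
import math


def van_es_entropy(sample, m):
    """van Es (1992): one-sided spacings X_{i+m;n} - X_{i;n} plus a harmonic correction."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    spacing_term = sum(
        math.log((n + 1) / m * (xs[i + m - 1] - xs[i - 1]))  # X_{i+m;n} - X_{i;n}
        for i in range(1, n - m + 1)
    ) / (n - m)
    harmonic = sum(1.0 / k for k in range(m, n + 1))  # sum_{k=m}^{n} 1/k
    return spacing_term + harmonic + math.log(m) - math.log(n + 1)


print(van_es_entropy([i / 10 for i in range(1, 11)], m=2))
```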

Correa (1995) suggested a modification of Vasicek's estimator. In order to estimate the density $f(\cdot)$ of $F(\cdot)$ in the interval $(X_{i-m;n}, X_{i+m;n})$, he used a local linear model based on $2m+1$ points:

$$F(X_{j;n}) = \alpha + \beta X_{j;n}, \qquad j = i-m, \ldots, i+m.$$

This produces an estimator with a smaller mean squared error. Correa's estimator is defined by

$$H^{(Cor)}_{m,n} := -\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{\sum_{j=i-m}^{i+m}(j-i)\left(X_{j;n} - \bar X_{(i)}\right)}{n\sum_{j=i-m}^{i+m}\left(X_{j;n} - \bar X_{(i)}\right)^2}\right),$$

where

$$\bar X_{(i)} := \frac{1}{2m+1}\sum_{j=i-m}^{i+m} X_{j;n}.$$

Here, $m < n/2$ is a positive integer, $X_{i;n} = X_{1;n}$ for $i < 1$ and $X_{i;n} = X_{n;n}$ for $i > n$. By simulations, Correa showed that, for some selected distributions, his estimator produces a smaller mean squared error than Vasicek's estimator and van Es' estimator.
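A sketch of Correa's estimator under the same boundary convention; `correa_entropy` is a hypothetical name, and windows made of tied observations (zero denominator) are not handled. Each term is the slope of a local least-squares fit of $j/n$ on $X_{j;n}$, that is, a local density estimate.

```python
import math


def correa_entropy(sample, m):
    """Correa (1995) local-linear-regression estimator H^{(Cor)}_{m,n}."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m < n / 2:
        raise ValueError("m must be a positive integer with m < n/2")

    def order_stat(j):  # X_{j;n}, clamped to X_{1;n} / X_{n;n} outside 1..n
        return xs[min(max(j, 1), n) - 1]

    total = 0.0
    for i in range(1, n + 1):
        indices = range(i - m, i + m + 1)
        window = [order_stat(j) for j in indices]
        local_mean = sum(window) / (2 * m + 1)  # \bar X_{(i)}
        numerator = sum((j - i) * (x - local_mean) for j, x in zip(indices, window))
        denominator = n * sum((x - local_mean) ** 2 for x in window)
        total += math.log(numerator / denominator)
    return -total / n


print(correa_entropy([1, 2, 3, 4, 5], m=1))
```

Like the differential entropy, the estimator is location invariant and shifts by $\log a$ when the data are multiplied by $a > 0$.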
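Wieczorkowski and Grzegorzewski (1999) modified Vasicek's estimator by adding a bias correction. Their estimator is given by

$$H^{(WG)}_{m,n} := H^{(V)}_{m,n} - \log(n) + \log(2m) - \left(1 - \frac{2m}{n}\right)\psi(2m) + \psi(n+1) - \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1),$$

where $\psi(x)$ is the digamma function defined by (see, e.g., Gradshteyn and Ryzhik (2007))

$$\psi(x) := \frac{d\log\Gamma(x)}{dx} = \frac{\Gamma'(x)}{\Gamma(x)}.$$

For integer arguments, we have

$$\psi(k) = -\gamma + \sum_{i=1}^{k-1}\frac{1}{i},$$

where $\gamma = 0.57721566490\ldots$ is Euler's constant.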
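The bias correction can be sketched by combining a Vasicek sketch with the integer-argument formula for $\psi$ given above; all names are hypothetical. Since only integer arguments occur, no general-purpose digamma routine is needed here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """psi(k) = -gamma + sum_{i=1}^{k-1} 1/i, valid for integers k >= 1."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, k))


def vasicek_entropy(xs, m):
    """H^{(V)}_{m,n} for sorted data, with the usual boundary clamping."""
    n = len(xs)

    def x(i):
        return xs[min(max(i, 1), n) - 1]

    return sum(math.log(n / (2 * m) * (x(i + m) - x(i - m)))
               for i in range(1, n + 1)) / n


def wg_entropy(sample, m):
    """Wieczorkowski-Grzegorzewski (1999) bias-corrected Vasicek estimator."""
    xs = sorted(sample)
    n = len(xs)
    correction = (-math.log(n) + math.log(2 * m)
                  - (1 - 2 * m / n) * digamma_int(2 * m)
                  + digamma_int(n + 1)
                  - 2.0 / n * sum(digamma_int(i + m - 1) for i in range(1, m + 1)))
    return vasicek_entropy(xs, m) + correction


print(wg_entropy([i / 10 for i in range(1, 11)], m=2))
```

The correction depends only on $n$ and $m$, which is why the Vasicek and Wieczorkowski-Grzegorzewski estimators have identical variances in the tables below.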

A series of experiments was conducted in order to compare the performance of our estimator, in terms of efficiency and robustness, with the following entropy estimators: Vasicek's estimator, van Es' estimator, Correa's estimator and the Wieczorkowski-Grzegorzewski estimator. We provide numerical illustrations regarding the mean squared error of each of the above estimators of the entropy $H(X)$. The computer programs were implemented in R. We have considered the following distributions; for each distribution, we give the value of $H(X)$.

• Standard normal distribution $N(0,1)$:

$$H(X) = \log(\sigma\sqrt{2\pi e}) = \log(\sqrt{2\pi e}).$$

• Uniform distribution on $[0,1]$:

$$H(X) = 0.$$

• Weibull distribution with the shape parameter equal to 2 and the scale parameter equal to 0.5:

$$H(X) = -0.09768653.$$

Recall that the entropy of the Weibull distribution with shape parameter $k > 0$ and scale parameter $\lambda > 0$ is given by

$$H(X) = \gamma\left(1 - \frac{1}{k}\right) + \log\left(\frac{\lambda}{k}\right) + 1.$$

• Exponential distribution with parameter $\lambda = 1$:

$$H(X) = 1 - \log(\lambda) = 1.$$

• Student's t-distribution with 1, 3 and 5 degrees of freedom:

$H(X) = 2.53102425$ (1 degree of freedom, i.e., the Cauchy distribution),
$H(X) = 1.77347757$ (3 degrees of freedom),
$H(X) = 1.62750267$ (5 degrees of freedom).

Keep in mind that the entropy of Student's t-distribution with $\nu$ degrees of freedom is given by

$$H(X) = \frac{\nu+1}{2}\left[\psi\left(\frac{1+\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] + \log\left(\sqrt{\nu}\,B\left(\frac{\nu}{2}, \frac{1}{2}\right)\right),$$

where $\psi(\cdot)$ is the digamma function and $B(\cdot,\cdot)$ is the beta function.
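The reference values listed above can be reproduced with the sketch below. Python's standard library has no digamma function, so `digamma` here is a hypothetical helper based on the recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the standard asymptotic expansion $\psi(x) \approx \log x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4} - \frac{1}{252x^6}$; the beta function is computed through `math.lgamma`.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x):
    """psi(x) for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def entropy_normal(sigma=1.0):
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))


def entropy_weibull(k, lam):
    return EULER_GAMMA * (1.0 - 1.0 / k) + math.log(lam / k) + 1.0


def entropy_exponential(lam):
    return 1.0 - math.log(lam)


def entropy_student_t(nu):
    return ((nu + 1) / 2 * (digamma((1 + nu) / 2) - digamma(nu / 2))
            + 0.5 * math.log(nu) + log_beta(nu / 2, 0.5))


print(entropy_normal(), entropy_weibull(2, 0.5), entropy_exponential(1.0))
print([entropy_student_t(nu) for nu in (1, 3, 5)])
```

For $\nu = 1$ the formula reduces to $\log(4\pi)$, the entropy of the standard Cauchy distribution.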

For sample sizes $n = 10$, $n = 20$ and $n = 50$, 5000 samples were generated. All the considered spacing-based estimators depend on the parameter $m < n/2$; the optimal choice of $m$ for a given sample size $n$ is still an open problem. Here, we have chosen the parameter $m$ according to Correa (1995), where $m = 3$ for $n = 10$ and $n = 20$, and $m = 4$ for $n = 50$. For our estimator $H_{\varepsilon;n}(X)$, we have used the standard Gaussian kernel, and the bandwidth is chosen by numerically minimizing the MSE of $H_{\varepsilon;n}(X)$ with respect to $h_n$. In Tables 1, 2 and 3 we have considered the normal distribution $N(0,1)$ with sample sizes $n = 10$, $n = 20$ and $n = 50$. In Tables 4-9, we have considered the uniform distribution, the Weibull distribution, the exponential distribution with parameter 1, Student's t-distribution with 3 and 5 degrees of freedom and the Cauchy distribution, respectively, with sample size $n = 50$. In all cases, and for each considered estimator, we compute the bias, variance and MSE by Monte Carlo over the 5000 replications.
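The Monte Carlo summaries reported in the tables can be sketched as follows, here for Vasicek's estimator with $n = 50$ and $m = 4$ under $N(0,1)$. The seed is arbitrary, and the normalisations (variance with divisor $R-1$, MSE with divisor $R$ over $R$ replications) are assumptions, since the text does not state them; all function names are hypothetical.

```python
import math
import random


def vasicek_entropy(sample, m):
    """H^{(V)}_{m,n} with X_{i;n} = X_{1;n} for i < 1 and X_{n;n} for i > n."""
    xs = sorted(sample)
    n = len(xs)

    def x(i):
        return xs[min(max(i, 1), n) - 1]

    return sum(math.log(n / (2 * m) * (x(i + m) - x(i - m)))
               for i in range(1, n + 1)) / n


def monte_carlo_summary(estimator, sampler, true_value, n, reps, rng):
    """Mean estimate, bias, variance and MSE over `reps` simulated samples of size n."""
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    mean = sum(estimates) / reps
    variance = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return {"estimate": mean, "bias": mean - true_value, "variance": variance, "mse": mse}


def normal_sampler(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


h_true = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
summary = monte_carlo_summary(lambda s: vasicek_entropy(s, m=4), normal_sampler,
                              h_true, n=50, reps=5000, rng=random.Random(1))
print(summary)
```

Swapping in another estimator or sampler only requires passing a different callable; the bias obtained here should be close to the value reported for $H^{(V)}_{m,n}$ in Table 3.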

From Tables 1-9, we can see that our estimator has the smallest MSE in most of the cases considered, namely for the normal distribution (Tables 1-3), the uniform distribution (Table 4), Student's t-distribution with 3 and 5 degrees of freedom (Tables 7-8) and the Cauchy distribution (Table 9). For the Weibull and exponential distributions (Tables 5-6), however, the Wieczorkowski-Grzegorzewski estimator has the smallest MSE. We think that the simulation results may be substantially improved if we choose, for example, the Beta or Gamma kernel, which have the advantage of taking the boundary effects into account. Overall, our estimator, based on the smoothed quantile density estimate, appears to behave better than the traditional ones in terms of efficiency.

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 0.85912656 | -0.55981198 | 0.07087153 | 0.38419010 |
| $H^{(Van)}_{m,n}$ | 1.18654642 | -0.23239211 | 0.08296761 | 0.13689074 |
| $H^{(Cor)}_{m,n}$ | 1.03589137 | -0.38304716 | 0.07197488 | 0.21862803 |
| $H^{(WG)}_{m,n}$ | 1.28060252 | -0.13833601 | 0.07087153 | 0.08993751 |
| $H_{\varepsilon;n}(X)$ | 1.33450763 | -0.08443091 | 0.07002134 | 0.07707990 |

Table 1: Results for $n = 10$, $m = 3$, $h_n = 0.157$, normal distribution $N(0,1)$

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.10631536 | -0.31262317 | 0.03182798 | 0.12952939 |
| $H^{(Van)}_{m,n}$ | 1.24420922 | -0.17472931 | 0.03532923 | 0.06582424 |
| $H^{(Cor)}_{m,n}$ | 1.24195337 | -0.17698517 | 0.03230411 | 0.06359555 |
| $H^{(WG)}_{m,n}$ | 1.36008222 | -0.05885632 | 0.03182798 | 0.03526021 |
| $H_{\varepsilon;n}(X)$ | 1.43214893 | 0.01321040 | 0.02920368 | 0.02934899 |

Table 2: Results for $n = 20$, $m = 3$, $h_n = 0.081$, normal distribution $N(0,1)$

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.26025240 | -0.15868613 | 0.01126393 | 0.03643395 |
| $H^{(Van)}_{m,n}$ | 1.29349009 | -0.12544844 | 0.01210094 | 0.02782615 |
| $H^{(Cor)}_{m,n}$ | 1.35839575 | -0.06054278 | 0.01148899 | 0.01514293 |
| $H^{(WG)}_{m,n}$ | 1.40287628 | -0.01606226 | 0.01126393 | 0.01151066 |
| $H_{\varepsilon;n}(X)$ | 1.42053167 | 0.00159314 | 0.01085020 | 0.01084189 |

Table 3: Results for $n = 50$, $m = 4$, $h_n = 0.0333$, normal distribution $N(0,1)$

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | -0.142936202 | -0.142936202 | 0.001730435 | 0.022161020 |
| $H^{(Van)}_{m,n}$ | -0.000811817 | -0.000811817 | 0.003428982 | 0.003429298 |
| $H^{(Cor)}_{m,n}$ | -0.048455430 | -0.048455430 | 0.001780981 | 0.004128732 |
| $H^{(WG)}_{m,n}$ | -0.0003123278 | -0.0003123278 | 0.0017304350 | 0.0017303596 |
| $H_{\varepsilon;n}(X)$ | -0.001503714 | -0.001503714 | 0.001005806 | 0.001007967 |

Table 4: Results for $n = 50$, $m = 4$, $h_n = 0.522$, uniform distribution

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | -0.260890919 | -0.163204390 | 0.009863565 | 0.036497265 |
| $H^{(Van)}_{m,n}$ | -0.20651205 | -0.10882552 | 0.01139025 | 0.02323097 |
| $H^{(Cor)}_{m,n}$ | -0.163758480 | -0.066071951 | 0.009953612 | 0.014317124 |
| $H^{(WG)}_{m,n}$ | -0.118267044 | -0.020580515 | 0.009863565 | 0.010285150 |
| $H_{\varepsilon;n}(X)$ | -0.08547838 | 0.01220815 | 0.01945246 | 0.01959761 |

Table 5: Results for $n = 50$, $m = 4$, $h_n = 0.6104$, Weibull distribution with the shape parameter equal to 2 and the scale parameter equal to 0.5

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 0.85568859 | -0.14431141 | 0.02233983 | 0.04316114 |
| $H^{(Van)}_{m,n}$ | 0.92199497 | -0.07800503 | 0.02313389 | 0.02921405 |
| $H^{(Cor)}_{m,n}$ | 0.95566152 | -0.04433848 | 0.02266589 | 0.02462726 |
| $H^{(WG)}_{m,n}$ | 0.998312466 | -0.001687534 | 0.022339826 | 0.022338205 |
| $H_{\varepsilon;n}(X)$ | 1.31010002 | 0.31010002 | 0.06795533 | 0.16410376 |

Table 6: Results for $n = 50$, $m = 4$, $h_n = 0.712$, exponential distribution with parameter 1

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.63721785 | -0.13625972 | 0.02903662 | 0.04759752 |
| $H^{(Van)}_{m,n}$ | 1.58175430 | -0.19172327 | 0.02360710 | 0.06036019 |
| $H^{(Cor)}_{m,n}$ | 1.74658234 | -0.02689523 | 0.03048823 | 0.03120548 |
| $H^{(WG)}_{m,n}$ | 1.779841726 | 0.006364154 | 0.029036620 | 0.029071315 |
| $H_{\varepsilon;n}(X)$ | 1.75882993 | -0.01464764 | 0.02592497 | 0.02613434 |

Table 7: Results for $n = 50$, $m = 4$, $h_n = 0.0336$, Student's t-distribution with 3 degrees of freedom

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.48096844 | -0.14653424 | 0.02047265 | 0.04194084 |
| $H^{(Van)}_{m,n}$ | 1.46376243 | -0.16374024 | 0.01830929 | 0.04511650 |
| $H^{(Cor)}_{m,n}$ | 1.58556888 | -0.04193379 | 0.02135329 | 0.02310746 |
| $H^{(WG)}_{m,n}$ | 1.623592312 | -0.003910361 | 0.020472653 | 0.020483849 |
| $H_{\varepsilon;n}(X)$ | 1.621348438 | -0.006154234 | 0.018522682 | 0.018556852 |

Table 8: Results for $n = 50$, $m = 4$, $h_n = 0.344$, Student's t-distribution with 5 degrees of freedom

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 2.51810441 | -0.01291983 | 0.09321374 | 0.09336202 |
| $H^{(Van)}_{m,n}$ | 2.24258072 | -0.28844353 | 0.05651327 | 0.13970163 |
| $H^{(Cor)}_{m,n}$ | 2.65247722 | 0.12145298 | 0.09936347 | 0.11409442 |
| $H^{(WG)}_{m,n}$ | 2.66072829 | 0.12970404 | 0.09321374 | 0.11001824 |
| $H_{\varepsilon;n}(X)$ | 2.49016605 | -0.04085819 | 0.07746897 | 0.07912286 |

Table 9: Results for $n = 50$, $m = 4$, $h_n = 0.0235$, Cauchy distribution

We now turn to comparing the robustness properties of the above estimators of the entropy $H(X)$. Robustness here is to be understood with respect to contamination, and not in terms of influence functions or breakdown points, which make sense only in parametric or semiparametric settings. According to Huber and Ronchetti (2009): "Most approaches to robustness are based on the following intuitive requirement: A discordant small minority should never be able to override the evidence of the majority of the observations." In the same reference, it is mentioned that resistance of a statistical procedure (i.e., insensitivity of the value of the estimate to small changes in the underlying sample) is equivalent to robustness for practical purposes, in view of Hampel's theorem; refer to (Huber and Ronchetti, 2009, Section 1.2.3 and Section 2.6). Typical examples of the notion of robustness in the nonparametric setting are the sample mean and the sample median, which are the nonparametric estimates of the population mean and median, respectively. Although nonparametric, the sample mean is highly sensitive to outliers; therefore, for a symmetric distribution and contaminated data, the sample median is more appropriate to estimate the population mean or median; refer to Huber and Ronchetti (2009) for more details.

In our simulation, we consider data generated from the $N(0,1)$ distribution, where a "small" proportion $\varepsilon$ of the observations is replaced by atypical ones generated from a contaminating distribution $F^*(\cdot)$. We consider two cases, $\varepsilon = 4\%$ and $\varepsilon = 10\%$, and we choose the contaminating distribution $F^*(\cdot)$ to be the uniform distribution on $[0,1]$; the results are presented in Table 10 for $\varepsilon = 4\%$ and in Table 11 for $\varepsilon = 10\%$. Let $f(\cdot)$ denote the density function of $N(0,1)$ and $f^*(\cdot)$ the density of the contaminating distribution $F^*(\cdot)$. The contaminated sample $X_1, \ldots, X_n$ can be seen as if it had been generated from the density

$$f_\varepsilon(\cdot) := (1-\varepsilon)f(\cdot) + \varepsilon f^*(\cdot).$$

Since the sample is contaminated, all the above estimators tend to $H(f_\varepsilon)$ and not to $H(f)$. The objective here is to obtain the best estimate (in the sense that the corresponding MSE is the smallest) of the entropy $H(f)$ from the contaminated data $X_1, \ldots, X_n$. We consider the five estimators as above, and we compute their bias, variance and MSE by Monte Carlo using 5000 replications. From Tables 10-11, we can see that our estimator has the smallest MSE in both cases.
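A sketch of the contaminated design, assuming each observation is independently drawn from $U[0,1]$ with probability $\varepsilon$ (the text does not say whether the number of replaced observations is fixed or random). It also evaluates $H(f_\varepsilon)$ numerically, to show the target that the estimators approach under contamination; the integration limits, grid size and function names are arbitrary, hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def contaminated_sample(rng, n, eps):
    """Each observation is drawn from U[0, 1] with probability eps, otherwise from N(0, 1)."""
    return [rng.random() if rng.random() < eps else rng.gauss(0.0, 1.0)
            for _ in range(n)]


def mixture_entropy(eps, lo=-12.0, hi=12.0, n_points=48_000):
    """H(f_eps) = -int f_eps log f_eps, f_eps = (1 - eps) * phi + eps * 1_[0,1] (midpoint rule)."""
    phi = NormalDist().pdf
    step = (hi - lo) / n_points
    total = 0.0
    for i in range(n_points):
        x = lo + (i + 0.5) * step
        density = (1.0 - eps) * phi(x) + (eps if 0.0 <= x <= 1.0 else 0.0)
        if density > 0.0:
            total -= density * math.log(density) * step
    return total


rng = random.Random(11)  # arbitrary seed
sample = contaminated_sample(rng, n=50, eps=0.04)
print(len(sample), [round(mixture_entropy(e), 4) for e in (0.0, 0.04, 0.10)])
```

The gap between $H(f_\varepsilon)$ and $H(f) = \log\sqrt{2\pi e}$ is the part of the error that no consistent estimator can remove, which is why the comparison is made in terms of the MSE with respect to $H(f)$.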

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.24299668 | -0.17594185 | 0.01115852 | 0.04210290 |
| $H^{(Van)}_{m,n}$ | 1.27422798 | -0.14471055 | 0.01143207 | 0.03236179 |
| $H^{(Cor)}_{m,n}$ | 1.34170579 | -0.07723274 | 0.01160997 | 0.01756326 |
| $H^{(WG)}_{m,n}$ | 1.38562055 | -0.03331798 | 0.01115852 | 0.01225745 |
| $H_{\varepsilon;n}(X)$ | 1.40066144 | -0.01827709 | 0.01054644 | 0.01086995 |

Table 10: Results for $n = 50$, $m = 4$, $h_n = 0.0333$, $\varepsilon = 4\%$, normal distribution $N(0,1)$

| Estimator | Estimate | Bias | Variance | MSE |
|---|---|---|---|---|
| $H^{(V)}_{m,n}$ | 1.22256006 | -0.19637848 | 0.01245586 | 0.05100791 |
| $H^{(Van)}_{m,n}$ | 1.24540126 | -0.17353727 | 0.01355086 | 0.04365249 |
| $H^{(Cor)}_{m,n}$ | 1.32178766 | -0.09715087 | 0.01243944 | 0.02186529 |
| $H^{(WG)}_{m,n}$ | 1.36518393 | -0.05375460 | 0.01245586 | 0.01533296 |
| $H_{\varepsilon;n}(X)$ | 1.37585099 | -0.04308754 | 0.01219767 | 0.01404201 |

Table 11: Results for $n = 50$, $m = 4$, $h_n = 0.0333$, $\varepsilon = 10\%$, normal distribution $N(0,1)$

Remark 4.1 To understand the behavior of the proposed estimator well, it would be interesting to consider several families of distributions:

1. Distributions with support $(-\infty, \infty)$ that are symmetric.

2. Distributions with support $(-\infty, \infty)$ that are asymmetric.

3. Distributions with support $(0, \infty)$.

4. Distributions with support $]0, 1[$.

Extensive simulations for these families will be undertaken elsewhere.

In the following remark, we indicate how to choose the smoothing parameter in practice.

Remark 4.2 The choice of the smoothing parameter plays an instrumental role in the implementation of practical estimation. We recall that the smoothing parameter $h_n$ in our simulations has been chosen to minimize the MSE of the estimator, assuming that the underlying distribution is known. In the more general case, without assuming any knowledge of the underlying distribution function, one can use, among others, the selection procedure proposed in Jones (1992). Jones (1992) showed that the asymptotic MSE of $q_n(\cdot)$, in the case of the convolution-kernel estimator, is given by

$$\mathrm{AMSE}(q_n(t)) = \frac{h_n^4}{4}\,q''(t)^2\left(\int_{\mathbb{R}} x^2K(x)\,dx\right)^2 + \frac{1}{nh_n}\,q^2(t)\int_{\mathbb{R}} K^2(x)\,dx.$$

Minimizing the last expression with respect to $h_n$, we find the asymptotically optimal bandwidth for $q_n(\cdot)$:

$$h_n^{opt} = \left[\frac{q(t)^2\int_{\mathbb{R}} K^2(x)\,dx}{n\,q''(t)^2\left(\int_{\mathbb{R}} x^2K(x)\,dx\right)^2}\right]^{1/5}.$$

Note that $h_n^{opt}$ depends on the unknown functions

$$q^2(t) = \frac{1}{(f(Q(t)))^2}$$

and

$$q''(t) = \frac{3(f'(Q(t)))^2 - f''(Q(t))f(Q(t))}{(f(Q(t)))^5}.$$

These functions may be estimated in the classical way; refer to Jones (1992), Silverman (1986) and Cheng and Sun (2006), and the references therein, for more details on the subject. Another way to estimate the optimal value of $h_n$ is to use a cross-validation type method.
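The bandwidth formula above can be evaluated directly when $f$ is known. The sketch assumes a standard Gaussian kernel, for which $\int K^2 = 1/(2\sqrt{\pi})$ and $\int x^2 K = 1$, and a standard normal $f$. It plugs in the true density, so it is not the data-driven procedure of Jones (1992), and it fails where $q''(t) = 0$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Constants of the standard Gaussian kernel: int K^2 = 1/(2*sqrt(pi)), int x^2 K = 1
R_K = 1.0 / (2.0 * math.sqrt(math.pi))
MU2_K = 1.0


def jones_optimal_bandwidth(t, n, f, f1, f2, quantile):
    """h_opt(t) = [q(t)^2 R(K) / (n q''(t)^2 mu2(K)^2)]^(1/5) with q = 1/f(Q)."""
    x = quantile(t)
    q = 1.0 / f(x)
    q2 = (3.0 * f1(x) ** 2 - f2(x) * f(x)) / f(x) ** 5  # q''(t); must be nonzero
    return (q * q * R_K / (n * q2 * q2 * MU2_K ** 2)) ** 0.2


std_normal = NormalDist()


def phi1(x):  # f'(x) for the standard normal density
    return -x * std_normal.pdf(x)


def phi2(x):  # f''(x) for the standard normal density
    return (x * x - 1.0) * std_normal.pdf(x)


h = jones_optimal_bandwidth(0.5, 50, std_normal.pdf, phi1, phi2, std_normal.inv_cdf)
print(h)
```

At $t = 1/2$ one has $q = \sqrt{2\pi}$ and $q'' = (2\pi)^{3/2}$, so the expression collapses to $h^{opt} = \left(R(K)/(4\pi^2 n)\right)^{1/5}$. A pointwise optimum of this kind still has to be turned into a single global bandwidth for the integral in (1.5).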

Remark 4.3 The main problem in using entropy estimates such as (1.5) is to choose the smoothing parameter $h_n$ properly. With considerably more effort, we could derive analogous results for $H_{\varepsilon;n}(X)$ using the methods of Bouzebda and Elhattab (2009, 2010, 2011), as well as the modern empirical process tools developed by Einmahl and Mason (2005) in their work on uniform-in-bandwidth consistency of kernel-type estimators.



5 Test for normality 

A well-known theorem of information theory [see, e.g., p. 55, Shannon (1948)] states that, among all distributions that possess a density function $f(\cdot)$ and have a given variance $\sigma^2$, the entropy $H(X)$ is maximized by the normal distribution. The entropy of the normal distribution with variance $\sigma^2$ is $\log(\sigma\sqrt{2\pi e})$. As pointed out by Vasicek (1976), this property can be used for tests of normality. To this aim, one can use the estimate $H_{\varepsilon;n}(X)$ of $H(X)$, as follows. Let $X_1, \ldots, X_n$ be independent copies of a random variable $X$ with quantile density function $q(\cdot)$, and let $H_{\varepsilon;n}(X)$ be the estimator of $H(X)$ given in (1.5). Let $T_n$ denote the statistic

$$T_n := \log(\sqrt{2\pi}\,\hat\sigma_n) + 0.5 - H_{\varepsilon;n}(X), \qquad (5.1)$$

where $\hat\sigma_n^2$ is the sample variance based on $X_1, \ldots, X_n$, defined by

$$\hat\sigma_n^2 := \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2,$$

where

$$\bar X_n := \frac{1}{n}\sum_{i=1}^{n}X_i.$$

Since $\log(\sqrt{2\pi}\,\sigma)+0.5=\log(\sigma\sqrt{2\pi e})$, the statistic $T_n$ estimates the difference between the entropy of the normal distribution having the same variance as $X$ and $H(X)$; by the maximum-entropy property above, this difference is nonnegative and vanishes only under normality. The normality hypothesis is therefore rejected whenever the observed value of $T_n$ is significantly greater than 0, in the sense made precise below. The exact critical values $T_\alpha$ of $T_n$ at the significance levels $\alpha\in\,]0,1[$ are defined by the equation

$$\mathbb P(T_n\ge T_\alpha)=\alpha.$$

The distribution of $T_n$ under the null hypothesis cannot readily be obtained analytically. To evaluate the critical values $T_\alpha$, we have used a Monte Carlo method for sample sizes $10\le n\le 50$. For instance, for $\alpha=0.05$ and each $n\le 50$, we simulate 20 000 samples of size $n$ from the standard normal distribution and compute $T_n$ for each of them. Since $\alpha=0.05=1000/20\,000$, the critical value $T_{0.05}$ is taken to be the 1000th largest of the 20 000 simulated values. Our results are presented in Table 12.
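The following Python sketch illustrates the Monte Carlo procedure. It needs a concrete $H_{\varepsilon;n}(X)$, and (1.5) is not reproduced in this section, so the sketch makes explicit assumptions: $Q_n$ is the empirical quantile function, $\mu_n$ is Lebesgue measure and $K_n(x,v)=h^{-1}K((x-v)/h)$ with the Epanechnikov kernel $K$, so that $q_n(x)=\frac{d}{dx}\int Q_n(v)K_n(x,v)\,dv$ reduces to a kernel-weighted sum of spacings. The trimmed form $H_{\varepsilon;n}(X)=\varepsilon\log q_n(\varepsilon)+\int_{U(\varepsilon)}\log q_n(x)\,dx+\varepsilon\log q_n(1-\varepsilon)$ mirrors $H_\varepsilon(X)$ in the proof of Theorem 2.1. The function names, the defaults $\varepsilon=h=0.1$ and the $1/n$ normalisation of $\hat\sigma_n$ are illustrative choices, not the paper's.

```python
import math
import random
import statistics


def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) on ]-1, 1[, zero elsewhere."""
    return 0.75 * (1.0 - t * t) if abs(t) < 1.0 else 0.0


def quantile_density(x, xs_sorted, h):
    """q_n(x) = d/dx int Q_n(v) K_h(x - v) dv for the empirical quantile
    function Q_n (equal to X_{i;n} on ](i-1)/n, i/n]). Differentiating gives a
    kernel-weighted sum of spacings plus two boundary terms."""
    n = len(xs_sorted)
    total = xs_sorted[0] * epanechnikov(x / h) / h            # zero if x >= h
    total -= xs_sorted[-1] * epanechnikov((x - 1.0) / h) / h  # zero if x <= 1 - h
    for j in range(1, n):
        spacing = xs_sorted[j] - xs_sorted[j - 1]  # X_{j+1;n} - X_{j;n}
        total += spacing * epanechnikov((x - j / n) / h) / h
    return total


def entropy_estimate(sample, eps=0.1, h=0.1, grid=200):
    """H_{eps;n}(X) = eps log q_n(eps) + int_{U(eps)} log q_n + eps log q_n(1 - eps),
    the integral over U(eps) = [eps, 1 - eps] being done by the midpoint rule."""
    xs = sorted(sample)
    n = len(xs)
    if not (1.0 / n < h <= eps < 0.5):
        raise ValueError("this sketch requires 1/n < h <= eps < 1/2")
    width = (1.0 - 2.0 * eps) / grid
    integral = width * sum(
        math.log(quantile_density(eps + (k + 0.5) * width, xs, h))
        for k in range(grid)
    )
    return (eps * math.log(quantile_density(eps, xs, h)) + integral
            + eps * math.log(quantile_density(1.0 - eps, xs, h)))


def t_statistic(sample, eps=0.1, h=0.1):
    """T_n = log(sqrt(2 pi) sigma_hat) + 0.5 - H_{eps;n}(X), as in (5.1)."""
    sigma_hat = statistics.pstdev(sample)  # 1/n normalisation (assumption)
    return (math.log(math.sqrt(2.0 * math.pi) * sigma_hat) + 0.5
            - entropy_estimate(sample, eps, h))


def mc_critical_value(n, alpha=0.05, reps=20000, eps=0.1, h=0.1, seed=0):
    """Upper-tail Monte Carlo critical value of T_n under N(0, 1): the
    round(alpha * reps)-th largest simulated value (1000th of 20000 for 0.05)."""
    rng = random.Random(seed)
    values = sorted(
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], eps, h)
        for _ in range(reps)
    )
    k = max(1, round(alpha * reps))
    return values[reps - k]
```

With `reps=20000` this follows the design described above, although pure Python is slow at that size; the values it produces need not match Table 12, since the kernel, $\varepsilon$ and $h$ are assumptions.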



Sample size n    Significance level α
                 0.1          0.05         0.025        0.01         0.005
35               0.03660258   0.05896114   0.07819581   0.1041396    0.1229790
40               0.03641781   0.05732028   0.07655416   0.1003001    0.1177802
45               0.03011983   0.04910612   0.06554724   0.0844859    0.1004404
50               0.02534047   0.04232442   0.05874272   0.07830533   0.09371921

Table 12: Critical values of $T_n$.



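Park and Park (2003) establish entropy-based goodness-of-fit test statistics based on the nonparametric distribution functions of the sample entropy and of the modified sample entropy of Ebrahimi et al. (1994), and compare their performances for the exponential and normal distributions. The authors consider

$$H^{Ebr}_{m,n}:=\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{c_i\,m}\big(X_{i+m;n}-X_{i-m;n}\big)\right),$$

where $X_{i;n}:=X_{1;n}$ for $i<1$, $X_{i;n}:=X_{n;n}$ for $i>n$, and

$$c_i:=\begin{cases}1+\dfrac{i-1}{m} & \text{if } 1\le i\le m,\\[4pt] 2 & \text{if } m+1\le i\le n-m,\\[4pt] 1+\dfrac{n-i}{m} & \text{if } n-m<i\le n.\end{cases}$$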
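A direct transcription of $H^{Ebr}_{m,n}$ into Python, as a sketch; the function name is hypothetical, and the clamping of order statistics outside $1,\dots,n$ is the usual boundary convention for this estimator.

```python
import math


def ebrahimi_entropy(sample, m):
    """Spacing estimator H^{Ebr}_{m,n} of Ebrahimi et al. (1994).

    Order statistics outside 1..n are clamped: X_{i;n} = X_{1;n} for i < 1
    and X_{i;n} = X_{n;n} for i > n."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m <= n // 2:
        raise ValueError("window size m must satisfy 1 <= m <= n/2")

    def order_stat(i):
        # 1-based order statistic with the clamping convention above
        return xs[min(max(i, 1), n) - 1]

    total = 0.0
    for i in range(1, n + 1):
        if i <= m:
            c = 1.0 + (i - 1) / m
        elif i <= n - m:
            c = 2.0
        else:
            c = 1.0 + (n - i) / m
        spacing = order_stat(i + m) - order_stat(i - m)
        total += math.log(n * spacing / (c * m))
    return total / n
```

For equally spaced data with spacing $d$, the correction factors $c_i$ exactly compensate for the shorter windows at the boundary, so every term equals $\log(nd)$; the points $i/n$ therefore give 0, the entropy of the uniform distribution on $[0,1]$.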



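Yousefzadeh and Arghami (2008) use a new cdf estimator to obtain a nonparametric entropy estimate and use it for testing exponentiality and normality; they introduce the following estimator:

$$H^{You}_{m,n}:=\sum_{i=1}^{n}\frac{\hat F_n(X_{i+m;n})-\hat F_n(X_{i-m;n})}{\sum_{j=1}^{n}\big(\hat F_n(X_{j+m;n})-\hat F_n(X_{j-m;n})\big)}\,\log\left(\frac{X_{i+m;n}-X_{i-m;n}}{\hat F_n(X_{i+m;n})-\hat F_n(X_{i-m;n})}\right),$$

where

$$\hat F_n(x):=\begin{cases}\dfrac{n-1}{n(n+1)}\left(\dfrac{n}{n-1}+\dfrac{x-X_{0;n}}{X_{2;n}-X_{0;n}}+\dfrac{x-X_{1;n}}{X_{3;n}-X_{1;n}}\right) & \text{for } X_{1;n}\le x\le X_{2;n},\\[8pt] \dfrac{n-1}{n(n+1)}\left(i+\dfrac{1}{n-1}+\dfrac{x-X_{i-1;n}}{X_{i+1;n}-X_{i-1;n}}+\dfrac{x-X_{i;n}}{X_{i+2;n}-X_{i;n}}\right) & \text{for } X_{i;n}\le x\le X_{i+1;n},\ i=2,\dots,n-2,\\[8pt] \dfrac{n-1}{n(n+1)}\left(n-1+\dfrac{1}{n-1}+\dfrac{x-X_{n-2;n}}{X_{n;n}-X_{n-2;n}}+\dfrac{x-X_{n-1;n}}{X_{n+1;n}-X_{n-1;n}}\right) & \text{for } X_{n-1;n}\le x\le X_{n;n},\end{cases}$$

and

$$X_{0;n}:=X_{1;n}-\frac{n}{n-1}\big(X_{2;n}-X_{1;n}\big),\qquad X_{n+1;n}:=X_{n;n}+\frac{n}{n-1}\big(X_{n;n}-X_{n-1;n}\big).$$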



To compare the power of the proposed test, a simulation study was conducted. We used tests based on the following entropy estimators: $H^{(WO)}$, $H^{(WG)}$, $H^{Ebr}_{m,n}$ and $H^{You}_{m,n}$, together with our test (5.1). In the power comparison, we consider the following alternatives.

• The Weibull distribution with density function

$$f(x;\lambda,k)=\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left(-\left(\frac{x}{\lambda}\right)^{k}\right)\mathbb 1\{x>0\},$$

where $k>0$ is the shape parameter and $\lambda>0$ is the scale parameter of the distribution, and where $\mathbb 1\{\cdot\}$ stands for the indicator function of the set $\{\cdot\}$.

• The uniform distribution with density function

$$f(x)=1,\quad 0<x<1.$$

• The Student t-distribution with density function

$$f(x;\nu)=\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,\frac{1}{(1+x^2/\nu)^{(\nu+1)/2}},\quad \nu>2,\ -\infty<x<\infty.$$



Remark 5.1 The values reported in Table 13 for the competing statistics are the same as those calculated in Table 6, p. 1493, of Yousefzadeh and Arghami (2008), for the same alternatives that we consider in our comparison.
                 Statistics based on
Alternatives     $H_{\varepsilon;n}(X)$   $H^{(WO)}$   $H^{(WG)}$   $H^{Ebr}_{m,n}$   $H^{You}_{m,n}$
Uniform          0.9999                   0.9262       0.96685      0.9275            0.8768
Weibull(2)       0.8264                   0.3297       0.33795      0.4211            0.3444
Student $t_5$    0.9306                   0.1358       0.05530      0.1484            0.2345
Student $t_3$    1.0000                   0.3696       0.18245      0.3736            0.5124

Table 13: Power estimates of the 0.05-level tests against alternatives to the normal distribution, based on 20 000 replications for sample size $n=50$.



Alternatives     Smoothing parameter
Uniform          0.5297
Weibull(2)       0.6555
Student $t_5$    0.0310
Student $t_3$    0.0189

Table 14: Choice of smoothing parameter for $H_{\varepsilon;n}(X)$.

In Table 13 we report the results of the power comparison for sample size $n=50$. We performed 20 000 Monte Carlo simulations to compare the powers of the proposed test against four alternatives. From Table 13, we can see that the proposed test $T_n$ shows better power than all other statistics for the alternatives that we consider. It is natural that our test performs well for alternatives with unbounded support, since kernel-type estimators behave well in this situation.
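The power comparison amounts to simulating samples from each alternative and counting how often the test rejects. The sketch below uses Python's `random.weibullvariate(alpha, beta)` (scale `alpha`, shape `beta`) for the Weibull alternative and builds $t_\nu$ as $Z/\sqrt{V/\nu}$ with $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$ drawn by `random.gammavariate(nu / 2, 2)`. The Weibull scale is set to 1; this is immaterial, because $q_n$ is linear in $Q_n$, so multiplying the data by $a>0$ shifts both $\log\hat\sigma_n$ and $H_{\varepsilon;n}(X)$ by $\log a$ and leaves $T_n$ in (5.1) unchanged. The statistic is passed in as a callable, and all names are hypothetical.

```python
import math
import random


def sample_uniform(n, rng):
    """n draws from the uniform distribution on [0, 1)."""
    return [rng.random() for _ in range(n)]


def sample_weibull(n, rng, shape=2.0, scale=1.0):
    """n draws from Weibull(shape k, scale lambda); weibullvariate(alpha, beta)
    takes the scale as alpha and the shape as beta."""
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def sample_student_t(n, rng, nu):
    """n draws from t_nu as Z / sqrt(V / nu), Z ~ N(0, 1), V ~ chi^2_nu."""
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            for _ in range(n)]


# The four alternatives of Table 13 (Weibull scale fixed at 1, an assumption).
ALTERNATIVES = {
    "Uniform": sample_uniform,
    "Weibull(2)": lambda n, rng: sample_weibull(n, rng, shape=2.0),
    "Student t_5": lambda n, rng: sample_student_t(n, rng, 5),
    "Student t_3": lambda n, rng: sample_student_t(n, rng, 3),
}


def estimate_power(statistic, sampler, n, critical_value, reps=20000, seed=0):
    """Proportion of `reps` simulated samples of size n for which the
    statistic reaches the upper-tail critical value (i.e. the test rejects)."""
    rng = random.Random(seed)
    rejections = sum(
        statistic(sampler(n, rng)) >= critical_value for _ in range(reps)
    )
    return rejections / reps
```

For example, `estimate_power(T, ALTERNATIVES["Uniform"], 50, 0.04232442)`, with an implementation `T` of (5.1) and the Table 12 entry for $n=50$, $\alpha=0.05$, would estimate the first entry of the $H_{\varepsilon;n}(X)$ column of Table 13; the result depends on how $H_{\varepsilon;n}$ is implemented.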

6 Concluding remarks and future work

We have proposed a new estimator of entropy based on the kernel quantile density estimator. Simulations show that this estimator behaves better than its competitors, both with and without contamination. More precisely, its MSE is consistently smaller than the MSE of the spacing-based estimators considered in our study. It would be interesting to compare theoretically the power and the robustness of the test of normality based on the estimator proposed in the present paper with those of the tests considered in Esteban et al. (2001). The study of entropy in the presence of censored data is possible using a methodology similar to the one presented here. It would also be interesting to provide a complete investigation of the choice of the parameter $h_n$ for the kernel difference-type estimator; this requires nontrivial mathematics and would go well beyond the scope of the present paper.

The problems and the methods described here are all inherently univariate. A natural and useful multivariate extension is the use of copula functions. We propose to extend the results of this paper to the multivariate case in the following way. Consider an $\mathbb R^d$-valued random vector $\mathbf X=(X_1,\dots,X_d)$ with joint cdf

$$F(\mathbf x):=F(x_1,\dots,x_d):=\mathbb P(X_1\le x_1,\dots,X_d\le x_d)$$

and marginal cdfs

$$F_j(x_j):=\mathbb P(X_j\le x_j)\quad\text{for } j=1,\dots,d.$$

If the marginal distribution functions $F_1(\cdot),\dots,F_d(\cdot)$ are continuous, then, according to Sklar's theorem [Sklar (1959)], the copula function $C(\cdot)$ pertaining to $F(\cdot)$ is unique and

$$C(\mathbf u):=C(u_1,\dots,u_d):=F\big(Q_1(u_1),\dots,Q_d(u_d)\big),\quad\text{for } \mathbf u\in[0,1]^d,$$

where, for $j=1,\dots,d$, $Q_j(u):=\inf\{x:F_j(x)\ge u\}$ with $u\in(0,1]$, and $Q_j(0):=\lim_{t\downarrow 0}Q_j(t):=Q_j(0+)$, is the quantile function of $F_j(\cdot)$. The differential entropy may be represented via the copula density $c(\cdot)$ and the quantile densities $q_i(\cdot)$ as follows:

$$H(\mathbf X)=\sum_{i=1}^{d}H(X_i)+H(c),\tag{6.1}$$

where

$$H(X_i)=\int_0^1\log(q_i(u))\,du,\tag{6.2}$$

$q_i(u)=dQ_i(u)/du$ for $i=1,\dots,d$, and $H(c)$ is the copula entropy defined by

$$\begin{aligned}H(c)&=-\int_{\mathbb R^d}c\big(F_1(x_1),\dots,F_d(x_d)\big)\log\Big(c\big(F_1(x_1),\dots,F_d(x_d)\big)\Big)\,dF_1(x_1)\cdots dF_d(x_d)\\&=-\int_{[0,1]^d}c(u_1,\dots,u_d)\log\big(c(u_1,\dots,u_d)\big)\,du_1\cdots du_d\\&=-\int_{[0,1]^d}c(\mathbf u)\log\big(c(\mathbf u)\big)\,d\mathbf u,\end{aligned}\tag{6.3}$$

where

$$c(u_1,\dots,u_d)=\frac{\partial^d}{\partial u_1\cdots\partial u_d}C(u_1,\dots,u_d)$$

is the copula density.
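The decomposition (6.1) and the representation (6.2) can be checked numerically in the Gaussian case, where every term is available in closed form: for a bivariate normal vector with standard deviations $\sigma_1,\sigma_2$ and correlation $\rho$, $H(X_j)=\frac12\log(2\pi e\sigma_j^2)$, the Gaussian copula entropy is $H(c)=\frac12\log(1-\rho^2)$ (minus the mutual information), and $H(\mathbf X)=\log(2\pi e)+\frac12\log\det\Sigma$. The sketch also approximates (6.2) by the midpoint rule, using $q(u)=1/f(Q(u))$ and Python's `statistics.NormalDist`; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def entropy_from_quantile_density(dist, grid=100_000):
    """Midpoint-rule approximation of (6.2), H(X) = int_0^1 log q(u) du,
    using q(u) = 1 / f(Q(u)) for a continuous law with density f."""
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        total += -math.log(dist.pdf(dist.inv_cdf(u)))  # log q(u)
    return total / grid


def gaussian_entropy_terms(s1, s2, rho):
    """Closed-form terms of (6.1) for a bivariate normal vector with standard
    deviations s1, s2 and correlation rho (|rho| < 1)."""
    h1 = 0.5 * math.log(2.0 * math.pi * math.e * s1 ** 2)
    h2 = 0.5 * math.log(2.0 * math.pi * math.e * s2 ** 2)
    h_copula = 0.5 * math.log(1.0 - rho ** 2)  # -(mutual information) <= 0
    det_sigma = (s1 * s2) ** 2 * (1.0 - rho ** 2)
    h_joint = math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det_sigma)
    return h1, h2, h_copula, h_joint
```

The identity $h_1+h_2+H(c)=H(\mathbf X)$ holds exactly here because $\det\Sigma=\sigma_1^2\sigma_2^2(1-\rho^2)$; the copula term is the only place where the dependence between the coordinates enters.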



7 Proofs

This section is devoted to the proofs of our results. The previously defined notation 
continues to be used below. 

Proof of Theorem 2.1. Recall that we set, for all $\varepsilon\in\,]0,1/2[$, $U(\varepsilon)=[\varepsilon,1-\varepsilon]$, and

$$H_\varepsilon(X)=\varepsilon\log(q(\varepsilon))+\int_{U(\varepsilon)}\log(q(x))\,dx+\varepsilon\log(q(1-\varepsilon)).$$

Therefore, by (1.4), we have

$$|H_{\varepsilon;n}(X)-H(X)|\le|H_{\varepsilon;n}(X)-H_\varepsilon(X)|+|H_\varepsilon(X)-H(X)|=|H_{\varepsilon;n}(X)-H_\varepsilon(X)|+o(\eta(\varepsilon)).\tag{7.1}$$
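We evaluate the first term on the right-hand side of (7.1). By the triangle inequality, we have

$$\begin{aligned}|H_{\varepsilon;n}(X)-H_\varepsilon(X)|&=\Big|\varepsilon\big(\log(q_n(\varepsilon))-\log(q(\varepsilon))\big)+\int_{U(\varepsilon)}\big(\log(q_n(x))-\log(q(x))\big)\,dx+\varepsilon\big(\log(q_n(1-\varepsilon))-\log(q(1-\varepsilon))\big)\Big|\\&\le\big|\varepsilon\big(\log(q_n(\varepsilon))-\log(q(\varepsilon))\big)\big|+\Big|\int_{U(\varepsilon)}\big(\log(q_n(x))-\log(q(x))\big)\,dx\Big|+\big|\varepsilon\big(\log(q_n(1-\varepsilon))-\log(q(1-\varepsilon))\big)\big|.\end{aligned}$$

We note that, for $z>0$,

$$|\log z|\le|z-1|+|1/z-1|,$$

since $\log z\le z-1$ for $z\ge 1$ and $-\log z\le 1/z-1$ for $0<z<1$. Therefore, we have

$$|\log(q_n(x))-\log(q(x))|=\left|\log\frac{q_n(x)}{q(x)}\right|\le\frac{|q_n(x)-q(x)|}{q(x)}+\frac{|q_n(x)-q(x)|}{q_n(x)}.$$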



Under conditions (Q.1-3) and (K.1-3), we may apply Theorem 2.2 in Cheng and Parzen (1997): for every fixed $\varepsilon\in\,]0,1/2[$, recalling $M(q;K_n)$ defined in Theorem 2.1, we have, as $n\to\infty$,

$$\sup_{x\in U(\varepsilon)}|q_n(x)-q(x)|=O_{\mathbb P}\big(n^{-1/2}M(q;K_n)+n^{-\beta}\big).\tag{7.2}$$

We infer that, with probability tending to one, $q_n(x)\ge(1/2)q(x)$ uniformly over $x\in U(\varepsilon)$ for all $n$ large enough. This fact implies that, with probability tending to one,

$$|H_{\varepsilon;n}(X)-H_\varepsilon(X)|\le 4\sup_{x\in U(\varepsilon)}|q_n(x)-q(x)|,$$

which, combined with (7.1) and (7.2), implies that

$$|H_{\varepsilon;n}(X)-H(X)|=O_{\mathbb P}\big(n^{-1/2}M(q;K_n)+n^{-\beta}+\eta(\varepsilon)\big).$$

Thus the proof is complete. □
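The two elementary facts behind the bound above can be sanity-checked on a grid of ratios $r=q_n(x)/q(x)$: the inequality $|\log r|\le|r-1|+|1/r-1|$ for all $r>0$, and its consequence $|\log r|\le 3|r-1|$ once $q_n(x)\ge q(x)/2$ (then $|1/r-1|=|r-1|/r\le 2|r-1|$). This is a check on a finite grid, not a proof, and `bounds_hold` is a hypothetical helper.

```python
import math


def bounds_hold(r):
    """For a ratio r = q_n(x)/q(x) > 0, check |log r| <= |r - 1| + |1/r - 1|
    and, when r >= 1/2 (i.e. q_n(x) >= q(x)/2), |log r| <= 3|r - 1|."""
    lhs = abs(math.log(r))
    tol = 1e-12  # absorbs floating-point rounding
    general = lhs <= abs(r - 1.0) + abs(1.0 / r - 1.0) + tol
    localized = r < 0.5 or lhs <= 3.0 * abs(r - 1.0) + tol
    return general and localized


ratios = [10 ** (k / 100.0) for k in range(-300, 301)]  # from 1e-3 to 1e3
print(all(bounds_hold(r) for r in ratios))  # True
```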
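Proof of Theorem 2.2. Throughout, we will make use of the fact (see, e.g., Csörgő and Révész (1978, 1981)) that we can define $X_1,\dots,X_n$ on a probability space which carries a sequence of Brownian bridges

$$\{B_n(t):t\in[0,1],\ n\ge 1\},$$

such that

$$\sqrt n\big(Q_n(t)-Q(t)\big)-q(t)B_n(t)=q(t)e_n(t)\tag{7.3}$$

and

$$\limsup_{n\to\infty}\frac{n^{1/2}}{A_\gamma(n)}\sup_{0\le t\le 1}|e_n(t)|\le C,\tag{7.4}$$

with probability one, where $C$ is a universal constant and

$$A_\gamma(n):=\begin{cases}\log n, & \text{if } \max\{q(0),q(1)\}<\infty \text{ or } \gamma<2,\\ (\log\log n)^{\gamma}(\log n)^{(1+\nu)(\gamma-1)}, & \text{if } \gamma\ge 2,\end{cases}$$

with $\gamma$ as in (Q.2) and an arbitrary positive $\nu$. Using a Taylor expansion, we get, for some $\lambda\in\,]0,1[$,

$$\begin{aligned}\sqrt n\big(H_{\varepsilon;n}(X)-H_\varepsilon(X)\big)&=\int_{U(\varepsilon)}\sqrt n\big(\log(q_n(x))-\log(q(x))\big)\,dx\\&\quad+\sqrt n\,\varepsilon\big(\log(q_n(\varepsilon))-\log(q(\varepsilon))\big)+\sqrt n\,\varepsilon\big(\log(q_n(1-\varepsilon))-\log(q(1-\varepsilon))\big)\\&=\int_{U(\varepsilon)}\sqrt n\left(\frac{q_n(x)-q(x)}{\lambda q_n(x)+(1-\lambda)q(x)}\right)dx+L_n(\varepsilon)\\&=:T_n(\varepsilon)+L_n(\varepsilon),\end{aligned}\tag{7.5}$$

where $L_n(\varepsilon)$ denotes the sum of the two boundary terms on the second line.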

Under our assumptions, $q_n(x)\to q(x)$ in probability, which implies that $\lambda q_n(x)+(1-\lambda)q(x)\to q(x)$ in probability. Thus, by Slutsky's theorem, as $n$ tends to infinity, $T_n(\varepsilon)$ has the same limiting law as

$$\widetilde T_n(\varepsilon):=\int_{U(\varepsilon)}\big(1/q(x)\big)\sqrt n\big(q_n(x)-q(x)\big)\,dx.\tag{7.6}$$



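We have the following decomposition:

$$\begin{aligned}\widetilde T_n(\varepsilon)&=\int_{U(\varepsilon)}\big(1/q(x)\big)\sqrt n\,\frac{d}{dx}\left(\int Q_n(v)K_n(x,v)\,d\mu_n(v)-Q(x)\right)dx\\&=\int_{U(\varepsilon)}\big(1/q(x)\big)\frac{d}{dx}\left(\int\sqrt n\big(Q_n(v)-Q(v)\big)K_n(x,v)\,d\mu_n(v)\right)dx\\&\quad+\int_{U(\varepsilon)}\big(1/q(x)\big)\sqrt n\,\frac{d}{dx}\left(\int Q(v)K_n(x,v)\,d\mu_n(v)-Q(x)\right)dx.\end{aligned}$$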
Using condition (K.3) and

$$\int_{U(\varepsilon)}\big(1/q(x)\big)\,dx<\infty,$$

we have the following:

$$\int_{U(\varepsilon)}\big(1/q(x)\big)\sqrt n\,\frac{d}{dx}\left(\int Q(v)K_n(x,v)\,d\mu_n(v)-Q(x)\right)dx=O\big(n^{1/2-\beta}\big).$$

This, in turn, implies, in connection with (7.3), that

$$\widetilde T_n(\varepsilon)=\int_{U(\varepsilon)}\big(1/q(x)\big)\frac{d}{dx}\left(\int\big(q(v)B_n(v)+q(v)e_n(v)\big)K_n(x,v)\,d\mu_n(v)\right)dx+O\big(n^{1/2-\beta}\big).$$



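Recalling the arguments of the proof of Lemma 3.2 of Cheng (1995), with $I_n(u)=[u-\delta_n,u+\delta_n]$ and $P_n(u)=[0,1]\setminus I_n(u)$, we can show that

$$\left|\int_{U(\varepsilon)}\big(1/q(x)\big)\frac{d}{dx}\left(\int_0^1 q(v)e_n(v)K_n(x,v)\,d\mu_n(v)\right)dx\right|\le\int_{U(\varepsilon)}\big(1/q(x)\big)\,dx\,\sup_{x\in U(\varepsilon)}\left|\frac{d}{dx}\int_0^1 q(v)e_n(v)K_n(x,v)\,d\mu_n(v)\right|.$$

Then it follows, by (7.4), that

$$\begin{aligned}\sup_{x\in U(\varepsilon)}\left|\int_0^1 q(v)e_n(v)\frac{\partial}{\partial x}K_n(x,v)\,d\mu_n(v)\right|&\le\sup_{0\le v\le 1}|e_n(v)|\,\sup_{x\in U(\varepsilon)}\int_0^1 q(v)\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\\&\le Cn^{-1/2}A_\gamma(n)\left(\sup_{x\in U(\varepsilon)}\int_{I_n(x)}q(v)\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)+\sup_{x\in U(\varepsilon)}\int_{P_n(x)}q(v)\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\right)\\&=:R_n^{(1)}(\varepsilon)+R_n^{(2)}(\varepsilon).\end{aligned}$$

Since $q(\cdot)$ is continuous on $]0,1[$, it is bounded on the compact interval $U(\varepsilon-\delta_n)\subset\,]0,1[$, which gives, by condition (K.2),

$$R_n^{(1)}(\varepsilon)\le Cn^{-1/2}A_\gamma(n)\sup_{x\in U(\varepsilon-\delta_n)}|q(x)|=Cn^{-1/2}A_\gamma(n)\,O(1)=o(1),$$

with probability one.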

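Let $s=1+\delta$ with $\delta>0$ arbitrary, and let $r$ be defined by $1/r=1-1/s$. Then Hölder's inequality implies

$$R_n^{(2)}(\varepsilon)\le Cn^{-1/2}A_\gamma(n)\sup_{x\in U(\varepsilon)}\left(\int_{[0,1]}[q(v)]^r\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\right)^{1/r}\left(\int_{P_n(x)}\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\right)^{1/s}.$$

Under condition (K.3), we can see that

$$\sup_{x\in U(\varepsilon)}\left(\int_{[0,1]}[q(v)]^r\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\right)^{1/r}\le\sup_{x\in U(\varepsilon)}q(x)\left(\int_{[0,1]}\left|\frac{\partial}{\partial x}K_n(x,v)\right|d\mu_n(v)\right)^{1/r}.$$

This, in connection with condition (K.2), implies

$$R_n^{(2)}(\varepsilon)=Cn^{-1/2}A_\gamma(n)\,O(1)\,O\Big(\sup_{x\in U(\varepsilon)}q(x)\Big)=o(1),$$

with probability one.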



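Hence

$$\int_{U(\varepsilon)}\big(1/q(x)\big)\frac{d}{dx}\left(\int_0^1 q(v)e_n(v)K_n(x,v)\,d\mu_n(v)\right)dx=o_{\mathbb P}(1).$$

Then, since $n^{1/2-\beta}\to 0$ for $\beta>1/2$, we conclude that

$$\widetilde T_n(\varepsilon)=\int_{U(\varepsilon)}\big(1/q(x)\big)\frac{d}{dx}\left(\int_0^1 q(v)B_n(v)K_n(x,v)\,d\mu_n(v)\right)dx+o_{\mathbb P}(1).$$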



We have, by Cheng and Parzen (1997), p. 297 (see also Stadtmüller (1986, 1988) and Xiang (1994) for related results on smoothed Wiener processes and Brownian bridges),

$$\begin{aligned}\sup_{x\in U(\varepsilon)}&\left|\int_0^1 q(v)B_n(v)K_n(x,v)\,d\mu_n(v)-q(x)B_n(x)\right|\\&\le\sup_{x\in U(\varepsilon)}\int_0^1 q(v)\,|B_n(v)-B_n(x)|\,K_n(x,v)\,d\mu_n(v)+\sup_{x\in U(\varepsilon)}|B_n(x)|\int_0^1|q(v)-q(x)|\,K_n(x,v)\,d\mu_n(v)\\&\quad+\sup_{x\in U(\varepsilon)}q(x)|B_n(x)|\left|\int_0^1 K_n(x,v)\,d\mu_n(v)-1\right|\\&=O_{\mathbb P}\big((2\delta_n\log\delta_n^{-1})^{1/2}\big)+O_{\mathbb P}\big(\delta_n^{1/(1+\delta)}\big)+O_{\mathbb P}\big(n^{-\beta}\big)\\&=o_{\mathbb P}(1).\end{aligned}$$

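This gives, under condition (Q.1), integrating by parts,

$$\begin{aligned}\int_{U(\varepsilon)}\big(1/q(x)\big)&\frac{d}{dx}\left(\int_0^1 q(v)B_n(v)K_n(x,v)\,d\mu_n(v)\right)dx\\&=\left[\big(1/q(x)\big)\int_0^1 q(v)B_n(v)K_n(x,v)\,d\mu_n(v)\right]_{\varepsilon}^{1-\varepsilon}-\int_{U(\varepsilon)}\frac{d}{dx}\big(1/q(x)\big)\left(\int_0^1 q(v)B_n(v)K_n(x,v)\,d\mu_n(v)\right)dx\\&=\Big[\big(1/q(x)\big)q(x)B_n(x)\Big]_{\varepsilon}^{1-\varepsilon}-\int_{U(\varepsilon)}\frac{d}{dx}\big(1/q(x)\big)\,q(x)B_n(x)\,dx+o_{\mathbb P}(1)\\&=\int_{U(\varepsilon)}\big(q'(x)/q(x)\big)B_n(x)\,dx+B_n(1-\varepsilon)-B_n(\varepsilon)+o_{\mathbb P}(1).\end{aligned}$$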

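Making use of Csörgő and Révész (1981, Theorem 1.4.1), for $\varepsilon$ sufficiently small,

$$|B_n(1-\varepsilon)-B_n(1)|=o_{\mathbb P}(1)\quad\text{and}\quad|B_n(\varepsilon)-B_n(0)|=o_{\mathbb P}(1).$$

Since $B_n(0)=B_n(1)=0$, this implies that

$$\widetilde T_n(\varepsilon)=\int_{U(\varepsilon)}\big(q'(x)/q(x)\big)B_n(x)\,dx+o_{\mathbb P}(1).$$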



By Cheng and Parzen (1997, Theorem 2.2), we have, for $\beta>1/2$ ($\beta$ as in condition (K.3)),

$$\sup_{u\in U(\varepsilon)}\sqrt n\,\big|\log(q_n(u))-\log(q(u))\big|=o_{\mathbb P}(1),\tag{7.7}$$

which implies, by a Taylor expansion, that

$$L_n(\varepsilon)=o_{\mathbb P}(1),\tag{7.8}$$

where $L_n(\varepsilon)$ is given in (7.5). This gives

$$T_n(\varepsilon)+L_n(\varepsilon)\stackrel{d}{=}\int_{U(\varepsilon)}\big(q'(x)/q(x)\big)B_n(x)\,dx+o_{\mathbb P}(1).$$

It follows that

$$\sqrt n\big(H_{\varepsilon;n}(X)-H_\varepsilon(X)\big)\stackrel{d}{=}\int_{U(\varepsilon)}\big(q'(x)/q(x)\big)B_n(x)\,dx+o_{\mathbb P}(1).$$

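Here and elsewhere we denote by $\stackrel{d}{=}$ equality in distribution. Note that

$$\int_{U(\varepsilon)}\big(q'(x)/q(x)\big)B_n(x)\,dx$$

is a Gaussian random variable with mean

$$\mathbb E\left(\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)B_n(u)\,du\right)=\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)\,\mathbb E\big(B_n(u)\big)\,du=0,$$

and variance

$$\begin{aligned}&\mathbb E\left(\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)B_n(u)\,du\int_{U(\varepsilon)}\big(q'(v)/q(v)\big)B_n(v)\,dv\right)\\&\quad=\mathbb E\left(\int_{U(\varepsilon)}\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)\big(q'(v)/q(v)\big)B_n(u)B_n(v)\,du\,dv\right)\\&\quad=\int_{U(\varepsilon)}\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)\big(q'(v)/q(v)\big)\,\mathbb E\big(B_n(u)B_n(v)\big)\,du\,dv\\&\quad=\int_{U(\varepsilon)}\int_{U(\varepsilon)}\big(q'(u)/q(u)\big)\big(q'(v)/q(v)\big)\big(\min(u,v)-uv\big)\,du\,dv\\&\quad=\int_{U(\varepsilon)}\big(q'(v)/q(v)\big)\left((1-v)\int_{\varepsilon}^{v}\big(q'(u)/q(u)\big)u\,du+v\int_{v}^{1-\varepsilon}\big(q'(u)/q(u)\big)(1-u)\,du\right)dv\\&\quad=\int_{U(\varepsilon)}\log^2(q(x))\,dx-\varepsilon\log(q(\varepsilon))\log(q(1-\varepsilon))+\varepsilon\log^2(q(\varepsilon))-\log(q(1-\varepsilon))\int_{U(\varepsilon)}\log(q(x))\,dx\\&\qquad+H_\varepsilon(X)\big(\log(q(1-\varepsilon))-H_\varepsilon(X)\big)\\&\quad=\int_{U(\varepsilon)}\log^2(q(x))\,dx+\varepsilon\log^2(q(\varepsilon))+\varepsilon\log^2(q(1-\varepsilon))-H_\varepsilon^2(X)\\&\quad=\int_{[0,1]}\log^2(q(x))\,dx+o(\eta(\varepsilon))-\big(H(X)+o(\eta(\varepsilon))\big)^2\\&\quad=\operatorname{Var}\{\log(q(F(X)))\}+o\big(\eta(\varepsilon)+\eta^2(\varepsilon)\big).\end{aligned}$$

Thus the proof is complete. □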
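The limiting variance $\operatorname{Var}\{\log(q(F(X)))\}=\int_0^1\log^2(q(u))\,du-H^2(X)$ can be evaluated by quadrature. Since $q(F(x))=1/f(x)$, it equals $\operatorname{Var}\{\log f(X)\}$; for a normal law with standard deviation $\sigma$, $\log f(X)$ is an affine function of $Z^2/2$ with $Z\sim N(0,1)$, so the variance is $1/2$ whatever $\sigma$. The sketch below uses the midpoint rule with Python's `statistics.NormalDist`; the names and the grid size are illustrative.

```python
import math
from statistics import NormalDist


def log_q_moments(dist, grid=100_000):
    """Midpoint-rule approximations of int_0^1 log q(u) du (= H(X)) and
    int_0^1 log^2 q(u) du, with q(u) = 1 / f(Q(u))."""
    m1 = m2 = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        g = -math.log(dist.pdf(dist.inv_cdf(u)))  # log q(u)
        m1 += g
        m2 += g * g
    return m1 / grid, m2 / grid


def limiting_variance(dist, grid=100_000):
    """Var{log q(F(X))} = int log^2 q - (int log q)^2, the eps -> 0 limit of
    the variance computed above."""
    m1, m2 = log_q_moments(dist, grid)
    return m2 - m1 * m1
```

The integrand $\log q(u)$ grows only logarithmically as $u\to 0$ or $u\to 1$, so the midpoint rule, which never evaluates at the endpoints, converges without special treatment of the tails.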


